Search CORE

241 research outputs found

RobertNLP at the IWPT 2020 shared task: surprisingly simple enhanced UD parsing for English

Authors: Friedrich, Annemarie; Grünewald, Stefan
Publication date: 01/01/2020

This paper presents our system at the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies. Using a biaffine classifier architecture (Dozat and Manning, 2017) that operates directly on fine-tuned RoBERTa embeddings, our parser generates enhanced UD graphs by predicting the best dependency label (or the absence of a dependency) for each pair of tokens in the sentence. We address label sparsity issues by replacing lexical items in relations with placeholders at prediction time, later retrieving them from the parse in a rule-based fashion. In addition, we enforce structural graph constraints using a simple set of heuristics. On the English blind test data, our system achieves very high parsing accuracy, ranking 1st out of 10 with an ELAS F1 score of 88.94%.

OPUS Augsburg

Crossref
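The RobertNLP 2020 abstract describes scoring every token pair with a biaffine classifier whose label set includes "no dependency". The sketch below illustrates that scoring rule in plain Python under simplifying assumptions: the token vectors are given (in the paper they come from fine-tuned RoBERTa), there are no MLP projections or artificial root token, and the names `u_per_label`, `w_per_label`, `b_per_label` and the `_none_` label are hypothetical. It is not the authors' implementation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def bilinear(h_dep: Vector, u: Matrix, h_head: Vector) -> float:
    """Compute h_dep^T U h_head for one label-specific matrix U."""
    return sum(
        h_dep[i] * u[i][j] * h_head[j]
        for i in range(len(h_dep))
        for j in range(len(h_head))
    )


def biaffine_label_scores(
    h_dep: Vector,
    h_head: Vector,
    u_per_label: List[Matrix],
    w_per_label: List[Vector],
    b_per_label: List[float],
) -> List[float]:
    """Score each label l as h_dep^T U_l h_head + w_l . [h_dep; h_head] + b_l."""
    concat = h_dep + h_head  # list concatenation acts as vector concatenation
    scores = []
    for u, w, b in zip(u_per_label, w_per_label, b_per_label):
        linear = sum(wi * xi for wi, xi in zip(w, concat))
        scores.append(bilinear(h_dep, u, h_head) + linear + b)
    return scores


def predict_graph(
    embeddings: List[Vector],
    labels: List[str],
    u_per_label: List[Matrix],
    w_per_label: List[Vector],
    b_per_label: List[float],
) -> List[Tuple[int, int, str]]:
    """Pick the best label for every ordered (head, dependent) pair.

    Pairs whose best label is the hypothetical "_none_" label get no edge, so a
    token may end up with zero, one, or several heads (a graph, not a tree).
    """
    edges = []
    for dep, h_dep in enumerate(embeddings):
        for head, h_head in enumerate(embeddings):
            if dep == head:
                continue
            scores = biaffine_label_scores(h_dep, h_head, u_per_label, w_per_label, b_per_label)
            best = max(range(len(scores)), key=scores.__getitem__)
            if labels[best] != "_none_":
                edges.append((head, dep, labels[best]))
    return edges


if __name__ == "__main__":
    # Toy 2-d "embeddings" and hand-set parameters, for illustration only.
    labels = ["_none_", "nsubj"]
    u = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]
    w = [[0.0] * 4, [0.0] * 4]
    b = [0.5, 0.0]
    print(predict_graph([[1.0, 0.0], [0.0, 1.0]], labels, u, w, b))
```

Treating "no edge" as just another label is what lets a single classifier output arbitrary graphs: each pair is decided independently, which is also why the abstract mentions separate heuristics for structural constraints.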

Unifying the treatment of preposition-determiner contractions in German universal dependencies treebanks

Authors: Friedrich, Annemarie; Grünewald, Stefan
Publication date: 06/07/2023

HDT-UD, by a large margin the largest German UD treebank, and the German-LIT treebank currently do not analyze preposition-determiner contractions such as zum (= zu dem, “to the”) as multi-word tokens, which is inconsistent both with the UD guidelines and with other German UD corpora (GSD and PUD). In this paper, we show that harmonizing the corpora with regard to this highly frequent phenomenon using a lookup-table-based approach leads to a considerable increase in automatic parsing performance.

OPUS Augsburg
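The contraction paper harmonizes treebanks with a lookup-table-based approach. The following is a minimal sketch of what such a lookup could look like, assuming a small hypothetical table (not the authors' table) and emitting only the ID and FORM columns of CoNLL-U; a real conversion would also have to renumber HEAD indices and assign DEPREL values to the new syntactic words, which this sketch omits.

```python
from typing import Dict, List, Tuple

# Hypothetical, partial lookup table: contraction -> (preposition, determiner).
CONTRACTIONS: Dict[str, Tuple[str, str]] = {
    "am": ("an", "dem"),
    "beim": ("bei", "dem"),
    "im": ("in", "dem"),
    "ins": ("in", "das"),
    "vom": ("von", "dem"),
    "zum": ("zu", "dem"),
    "zur": ("zu", "der"),
}


def split_contractions(tokens: List[str]) -> List[str]:
    """Return ID<TAB>FORM lines, expanding known contractions into multi-word tokens."""
    lines = []
    word_id = 1
    for token in tokens:
        parts = CONTRACTIONS.get(token.lower())
        if parts is None:
            # Ordinary token: one syntactic word, one line.
            lines.append(f"{word_id}\t{token}")
            word_id += 1
            continue
        prep, det = parts
        if token[0].isupper():
            # Keep sentence-initial capitalization on the first syntactic word.
            prep = prep.capitalize()
        # Range line for the surface token, then one line per syntactic word.
        lines.append(f"{word_id}-{word_id + 1}\t{token}")
        lines.append(f"{word_id}\t{prep}")
        lines.append(f"{word_id + 1}\t{det}")
        word_id += 2
    return lines


if __name__ == "__main__":
    print("\n".join(split_contractions(["Er", "geht", "zum", "Bahnhof"])))
```

Because the table is consulted without context, the sketch treats every occurrence of a listed form the same way; any ambiguous forms would need extra rules.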

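Die Honigbiene: ein Modellorganismus der Neurobiologie: Kognition, Krankheiten und die Moleküle des Lernens bei einem sozialen Insekt [The honeybee: a model organism of neurobiology: cognition, diseases, and the molecules of learning in a social insect]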

Authors: Fuchs, Stefan; Grünewald, Bernd; Schneider, Christof
Publication date: 23/04/2009

Bees are popular for their honey and economically indispensable for their pollination services. However, bee mortality is at times reaching dramatic proportions, and not only in the United States: our native bee colonies are also under threat. This has prompted a large number of research projects on the biology of the bee and its protection. Using an integrated research approach, the Institut für Bienenkunde (Institute for Bee Research) of the Polytechnische Gesellschaft and the Goethe-Universität in Oberursel investigates the cognitive abilities of bees and how they are impaired by disease, stress, and insecticide poisoning.

Hochschulschriftenserver - Universität Frankfurt am Main

Applying Occam's Razor to Transformer-Based Dependency Parsing: What Works, What Doesn't, and What is Really Necessary

Authors: Friedrich, Annemarie; Grünewald, Stefan; Kuhn, Jonas
Publication date: 01/01/2021

The introduction of pre-trained transformer-based contextualized word embeddings has led to considerable improvements in the accuracy of graph-based parsers for frameworks such as Universal Dependencies (UD). However, previous works differ in various dimensions, including their choice of pre-trained language models and whether they use LSTM layers. With the aim of disentangling the effects of these choices and identifying a simple yet widely applicable architecture, we introduce STEPS, a new modular graph-based dependency parser. Using STEPS, we perform a series of analyses on the UD corpora of a diverse set of languages. We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study. Adding LSTM layers provides no benefits when using transformer-based embeddings. A multi-task training setup that outputs additional UD features may distort results. Taking these insights together, we propose a simple but widely applicable parser architecture and configuration, achieving new state-of-the-art results (in terms of LAS) for 10 out of 12 diverse languages.
Comment: 14 pages, 1 figure; camera-ready version for IWPT 202

arXiv.org e-Print Archive

OPUS Augsburg

Coordinate constructions in English enhanced universal dependencies: analysis and computational modeling

Authors: Friedrich, Annemarie; Grünewald, Stefan; Piccirilli, Prisca
Publication date: 01/01/2021

In this paper, we address the representation of coordinate constructions in Enhanced Universal Dependencies (UD), where relevant dependency links are propagated from conjunction heads to other conjuncts. English treebanks for enhanced UD have been created from gold basic dependencies using a heuristic rule-based converter, which propagates only core arguments. With the aim of determining which set of links should be propagated from a semantic perspective, we create a large-scale dataset of manually edited syntax graphs. We identify several systematic errors in the original data and propose to also propagate adjuncts. We observe high inter-annotator agreement for this semantic annotation task. Using our new manually verified dataset, we perform the first principled comparison of rule-based and (partially novel) machine-learning-based methods for conjunction propagation in English. We show that learning propagation rules is more effective than hand-designing heuristic rules. When using automatic parses, our neural graph-parser-based edge predictor outperforms the currently predominant pipelines that combine a basic-layer tree parser with converters.

arXiv.org e-Print Archive

OPUS Augsburg
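The coordination abstract contrasts a rule-based converter that propagates only core arguments with the proposal to also propagate adjuncts. A simplified rule-based propagation step might look like the sketch below. It is an illustration under stated assumptions, not the converter used for the English treebanks: edges are hypothetical `(head, dependent, label)` triples, labels are not augmented (no `conj:and`), only shared dependents are handled (not the shared governor), and the rule "skip a label the receiving conjunct already has" is an assumed heuristic.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int, str]  # (head, dependent, label); 0 is the artificial root

CORE_ARGS = frozenset({"nsubj", "obj", "iobj"})


def propagate_to_conjuncts(basic: List[Edge], labels: Iterable[str] = CORE_ARGS) -> List[Edge]:
    """Copy selected dependents of a first conjunct onto its other conjuncts.

    A dependent is copied only if the receiving conjunct does not already have
    a dependent with the same label (e.g. its own object). Passing a larger
    label set (e.g. adding "obl") also propagates adjuncts.
    """
    allowed = set(labels)
    enhanced = list(basic)
    for head, conjunct, label in basic:
        if label != "conj":
            continue
        own_labels = {lab for h, _, lab in basic if h == conjunct}
        for h, dep, lab in basic:
            if h == head and lab in allowed and lab not in own_labels:
                new_edge = (conjunct, dep, lab)
                if new_edge not in enhanced:
                    enhanced.append(new_edge)
    return enhanced


if __name__ == "__main__":
    # "She(1) bought(2) and(3) ate(4) apples(5)": subject and object attach to "bought".
    basic = [(0, 2, "root"), (2, 1, "nsubj"), (4, 3, "cc"), (2, 4, "conj"), (2, 5, "obj")]
    for edge in propagate_to_conjuncts(basic):
        print(edge)
```

The `labels` parameter is where the design question from the abstract shows up: keeping it at core arguments mimics the core-argument-only behaviour, while widening it is a crude stand-in for propagating adjuncts, which the paper argues is better learned than hand-coded.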

A corpus study of creating rule-based enhanced universal dependencies for German

Authors: Bürkle, Teresa; Friedrich, Annemarie; Grünewald, Stefan
Publication date: 01/01/2021

In this paper, we present a first attempt at enriching German Universal Dependencies (UD) treebanks with enhanced dependencies. Similar to the converter for English (Schuster and Manning, 2016), we develop a rule-based system for deriving enhanced dependencies from the basic layer, covering three linguistic phenomena: relative clauses, coordination, and raising/control. For quality control, we manually correct or validate a set of 196 sentences, finding that around 90% of the added relations are correct. Our data analysis reveals that difficulties arise mainly from inconsistencies in the basic-layer annotations. We show that the English system is generally applicable to German data, but that adapting it to the particularities of the German treebanks and language increases precision and recall by up to 10%. Comparing the application of our converter on gold-standard dependencies vs. automatic parses, we find that F1 drops by around 10% in the latter setting due to error propagation. Finally, an enhanced UD parser trained on a converted treebank performs poorly when evaluated against our annotations, indicating that more work remains to be done to create gold-standard enhanced German treebanks.

OPUS Augsburg
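Relative clauses are one of the three phenomena the German converter covers. The sketch below shows one rule of this kind, following the general enhanced UD convention: the relative pronoun is attached to its antecedent with `ref`, and the antecedent takes over the pronoun's role inside the relative clause. It is not the authors' converter; the caller is assumed to identify relative pronouns (for example via the `PronType=Rel` feature), the relative-clause label is a parameter because treebanks may differ, and pronouns embedded under a preposition are ignored. An English example is used for readability.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int, str]  # (head, dependent, label); 0 is the artificial root


def enhance_relative_clauses(
    basic: List[Edge], rel_pronouns: Set[int], relcl_label: str = "acl:relcl"
) -> List[Edge]:
    """Rewrite relative-pronoun edges into enhanced UD style."""
    enhanced = list(basic)
    for noun, verb, label in basic:
        if label != relcl_label:
            continue
        for head, dep, lab in basic:
            if head == verb and dep in rel_pronouns:
                # The antecedent takes over the pronoun's role in the clause ...
                enhanced.remove((head, dep, lab))
                enhanced.append((verb, noun, lab))
                # ... and the pronoun is linked to its antecedent via `ref`.
                enhanced.append((noun, dep, "ref"))
    return enhanced


if __name__ == "__main__":
    # "the(1) man(2) who(3) sleeps(4)"
    basic = [(2, 1, "det"), (4, 3, "nsubj"), (2, 4, "acl:relcl"), (0, 2, "root")]
    print(enhance_relative_clauses(basic, rel_pronouns={3}))
```

Because the rule reads only the basic layer, any annotation inconsistency there (the main error source the abstract reports) is copied straight into the enhanced graph.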

RobertNLP at the IWPT 2021 shared task: simple enhanced UD parsing for 17 languages

Authors: Friedrich, Annemarie; Grünewald, Stefan; Oertel, Frederik Tobias
Publication date: 01/01/2021

This paper presents our multilingual dependency parsing system as used in the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. Our system consists of an unfactorized biaffine classifier that operates directly on fine-tuned XLM-R embeddings and generates enhanced UD graphs by predicting the best dependency label (or the absence of a dependency) for each pair of tokens. To avoid sparsity issues resulting from lexicalized dependency labels, we replace lexical items in relations with placeholders at training and prediction time, later retrieving them from the parse via a hybrid rule-based/machine-learning system. In addition, we use model ensembling at prediction time. Our system achieves high parsing accuracy on the blind test data, ranking 3rd out of 9 with an average ELAS F1 score of 86.97.

OPUS Augsburg
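The 2021 abstract replaces lexical items in labels such as `obl:mit` with placeholders and later restores them. The sketch below shows only a purely rule-based version of that round trip, not the authors' hybrid rule-based/machine-learning system. Assumptions, all hypothetical: the placeholder symbol `[lex]`, the function names, and the rule that the lexical item is the lemma of a `case`, `mark`, or `cc` child of the dependent. Labels with more than one subtype (e.g. `obl:x:y`) are left untouched.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int, str]  # (head, dependent, label)
PLACEHOLDER = "[lex]"  # hypothetical placeholder symbol


def find_marker(
    dependent: int,
    edges: List[Edge],
    lemmas: Dict[int, str],
    marker_labels: Tuple[str, ...] = ("case", "mark", "cc"),
) -> Optional[str]:
    """Return the lemma of the first case/mark/cc child of `dependent`, if any."""
    for head, dep, lab in edges:
        if head == dependent and lab in marker_labels:
            return lemmas[dep]
    return None


def delexicalize(label: str, marker_lemma: Optional[str]) -> str:
    """Training side: 'obl:mit' -> 'obl:[lex]' when the subtype matches the marker."""
    base, _, subtype = label.partition(":")
    if marker_lemma is not None and subtype == marker_lemma.lower():
        return f"{base}:{PLACEHOLDER}"
    return label


def relexicalize(label: str, marker_lemma: Optional[str]) -> str:
    """Prediction side: fill the placeholder back in from the predicted parse."""
    base, _, subtype = label.partition(":")
    if subtype != PLACEHOLDER:
        return label
    if marker_lemma is None:
        return base  # no marker found: fall back to the bare relation
    return f"{base}:{marker_lemma.lower()}"


if __name__ == "__main__":
    # "... mit(3) Freund(4)" where Freund is an obl dependent of token 2.
    edges = [(2, 4, "obl"), (4, 3, "case")]
    marker = find_marker(4, edges, {3: "mit", 4: "Freund"})
    masked = delexicalize("obl:mit", marker)
    print(masked, relexicalize(masked, marker))
```

The benefit is a smaller label vocabulary: `obl:mit`, `obl:auf`, and `obl:in` all collapse into one class `obl:[lex]`, so the classifier no longer has to learn a separate, often rare, label for every preposition.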

MiST: a large-scale annotated resource and neural models for functions of modal verbs in English scientific text

Authors: Friedrich, Annemarie; Grünewald, Stefan; Henning, Sophie; Macher, Nicole
Publication date: 05/07/2023

Modal verbs (e.g., can, should, or must) occur very frequently in scientific articles. Decoding their function is not straightforward: they are often used for hedging, but they may also denote abilities and restrictions. Understanding their meaning is important for accurate information extraction from scientific text. To foster research on the usage of modals in this genre, we introduce the MIST (Modals In Scientific Text) dataset, which contains 3737 modal instances in five scientific domains annotated for their semantic, pragmatic, or rhetorical function. We systematically evaluate a set of competitive neural architectures on MIST. Transfer experiments reveal that leveraging non-scientific data is of limited benefit for modeling the distinctions in MIST. Our corpus analysis provides evidence that scientific communities differ in their usage of modal verbs, yet classifiers trained on scientific data generalize to some extent to unseen scientific domains.

OPUS Augsburg